15 research outputs found

    Anatomical Structure Sketcher for Cephalograms by Bimodal Deep Learning

    Full text link
    The lateral cephalogram is a commonly used medium for acquiring patient-specific morphology for diagnosis and treatment planning in clinical dentistry. Robust anatomical structure detection and accurate annotation remain challenging given personal skeletal variations, image blur caused by device-specific projection magnification, and structure overlap in lateral cephalograms. We propose a novel cephalogram sketcher system in which the contour extraction of anatomical structures is formulated as a cross-modal morphology transfer from regular image patches to arbitrary curves. Specifically, the image patches of structures of interest are located by a hierarchical pictorial model. The automatic contour sketcher converts an image patch to a morphable boundary curve via a bimodal deep Boltzmann machine. The deep machine learns a joint representation of patch textures and contours, forming a path from one modality (patches) to the other (contours). Thus, the sketcher can infer contours by alternating Gibbs sampling along this path, in a manner similar to data completion. The proposed method is not only robust in structure detection but also tends to produce accurate structure shapes and landmarks, even in blurry X-ray images. Experiments performed on clinically captured cephalograms demonstrate the effectiveness of our method.
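
    As a rough illustration of the cross-modal inference step, the sketch below performs alternating Gibbs sampling with the patch modality clamped so that the missing contour modality is filled in, much like data completion. It is a hypothetical, heavily simplified stand-in for the paper's bimodal deep Boltzmann machine: a single shared hidden layer replaces the deep architecture, and all dimensions and weights are made up for the example.

        import numpy as np

        rng = np.random.default_rng(0)

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        # Illustrative sizes (not from the paper).
        D_PATCH, D_CONTOUR, D_HIDDEN = 256, 64, 128

        # Random weights stand in for a trained joint patch/contour model.
        W_p = rng.normal(scale=0.01, size=(D_PATCH, D_HIDDEN))    # patch   <-> hidden
        W_c = rng.normal(scale=0.01, size=(D_CONTOUR, D_HIDDEN))  # contour <-> hidden
        b_h = np.zeros(D_HIDDEN)
        b_c = np.zeros(D_CONTOUR)

        def infer_contour(patch, n_steps=200):
            """Infer the contour modality from a clamped patch by alternating
            Gibbs sampling, analogous to data completion: the patch units stay
            fixed while hidden and contour units are resampled in turn."""
            contour = np.zeros(D_CONTOUR)          # unknown modality, start empty
            p_c = np.zeros(D_CONTOUR)
            for _ in range(n_steps):
                # Sample hidden units given both modalities.
                p_h = sigmoid(patch @ W_p + contour @ W_c + b_h)
                h = (rng.random(D_HIDDEN) < p_h).astype(float)
                # Resample only the missing (contour) modality given the hiddens.
                p_c = sigmoid(h @ W_c.T + b_c)
                contour = (rng.random(D_CONTOUR) < p_c).astype(float)
            return p_c                             # final activation as the contour code

        patch = rng.random(D_PATCH)                # a located structure-of-interest patch
        contour_code = infer_contour(patch)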

    Enhanced Random Forest with Image/Patch-Level Learning for Image Understanding

    Full text link
    Image understanding is an important research domain in computer vision because of its wide range of real-world applications. For an image understanding framework that uses the Bag-of-Words representation, the visual codebook is an essential component. The random forest (RF), a tree-structured discriminative codebook, has been a popular choice. However, RF performance can degrade if the local patch labels are poorly assigned. In this paper, we tackle this problem with a novel way of updating the RF codebook learning to obtain a more discriminative codebook, by introducing soft class labels estimated from a pLSA model through a feedback scheme. The feedback scheme is applied at both the image and patch levels, in contrast to state-of-the-art RF codebook learning that focuses on either the image or the patch level only. Experiments on the 15-Scene and C-Pascal datasets show the effectiveness of the proposed method on the image understanding task.
    Comment: Accepted at ICPR 2014 (Oral)
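
    A minimal sketch of the feedback idea is given below, under several stated assumptions: scikit-learn's RandomForestClassifier stands in for the RF codebook, LatentDirichletAllocation is substituted for the pLSA model, the data are random toy descriptors, and the soft labels are hardened by argmax for brevity rather than fed into the tree learning as in the paper.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.decomposition import LatentDirichletAllocation  # stand-in for pLSA

        rng = np.random.default_rng(0)

        # Toy data: patch descriptors grouped per image, with image-level labels.
        n_images, patches_per_image, dim, n_classes = 20, 30, 16, 3
        X = rng.normal(size=(n_images, patches_per_image, dim))
        image_labels = rng.integers(0, n_classes, size=n_images)

        def train_codebook(patch_X, patch_y):
            # RF codebook: every leaf of every tree acts as a visual word.
            rf = RandomForestClassifier(n_estimators=10, max_depth=6, random_state=0)
            rf.fit(patch_X, patch_y)
            return rf

        def encode_images(rf, X):
            # Bag-of-Words histogram of leaf indices for each image.
            leaves = rf.apply(X.reshape(-1, X.shape[-1]))      # (n_patches, n_trees)
            hists = np.zeros((X.shape[0], leaves.max() + 1))
            per_image = leaves.reshape(X.shape[0], -1)
            for i, leaf_ids in enumerate(per_image):
                np.add.at(hists[i], leaf_ids, 1)
            return hists / hists.sum(axis=1, keepdims=True)

        # Round 0: every patch inherits its image's label (potentially noisy).
        patch_X = X.reshape(-1, dim)
        patch_y = np.repeat(image_labels, patches_per_image)
        rf = train_codebook(patch_X, patch_y)

        # Feedback round: topic posteriors over the image histograms provide
        # soft labels, hardened here to re-label patches and retrain the codebook.
        hists = encode_images(rf, X)
        topics = LatentDirichletAllocation(n_components=n_classes, random_state=0)
        soft_image_labels = topics.fit_transform(hists)        # (n_images, n_classes)
        patch_y = np.repeat(soft_image_labels.argmax(axis=1), patches_per_image)
        rf = train_codebook(patch_X, patch_y)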

    Transferring of Speech Movements from Video to 3D Face Space

    No full text
    We present a novel method for transferring speech animation recorded in low-quality videos to high-resolution 3D face models. The basic idea is to synthesize the animated faces by interpolation over a small set of 3D key face shapes that span a 3D face space. The 3D key shapes are extracted by an unsupervised learning process in the 2D video space to form a set of 2D visemes, which are then mapped to the 3D face space. The learning process consists of two main phases: 1) Isomap-based nonlinear dimensionality reduction to embed the video speech movements into a low-dimensional manifold, and 2) K-means clustering in the low-dimensional space to extract 2D key viseme frames. Our main contribution is the use of the Isomap-based learning method to extract the intrinsic geometry of the speech video space, which makes it possible to define the 3D key viseme shapes; as a result, only a limited number of 3D key face models need to be captured with a general 3D scanner. Moreover, we also develop a skull movement recovery method based on simple anatomical structures to enhance 3D realism in local mouth movements. Experimental results show that our method can achieve realistic 3D animation effects with a small number of 3D key face models.
    Index Terms: Facial animation, speech synchronization, visual speech synthesis, performance-driven animation, machine learning.
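
    The two learning phases can be sketched with off-the-shelf tools; the snippet below uses scikit-learn's Isomap and KMeans on made-up per-frame mouth features, so the feature extraction, the number of visemes, and the neighbourhood size are illustrative assumptions rather than the paper's settings.

        import numpy as np
        from sklearn.manifold import Isomap
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)

        # Toy stand-in for per-frame mouth-region features from a speech video.
        n_frames, feat_dim = 500, 40
        frames = rng.normal(size=(n_frames, feat_dim))

        # 1) Isomap embeds the video speech movements into a low-dimensional manifold.
        embedding = Isomap(n_neighbors=10, n_components=3).fit_transform(frames)

        # 2) K-means in the embedded space; the frame nearest each centroid is
        #    taken as a 2D key viseme frame (to be paired with a scanned 3D key shape).
        n_visemes = 12
        km = KMeans(n_clusters=n_visemes, n_init=10, random_state=0).fit(embedding)
        key_frame_ids = [
            int(np.argmin(np.linalg.norm(embedding - c, axis=1)))
            for c in km.cluster_centers_
        ]
        print("Key viseme frame indices:", key_frame_ids)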

    Unsupervised Random Forest Manifold Alignment for Lipreading

    No full text
    Lipreading from visual channels remains a challenging topic given the wide variation in speaking characteristics. In this paper, we present an efficient lipreading approach based on unsupervised random forest manifold alignment (RFMA). A density random forest is employed to estimate the affinity of patch trajectories in speaking facial videos. We propose novel node-splitting criteria to avoid rank deficiency when learning density forests. By virtue of the hierarchical structure of random forests, the trajectory affinities are measured efficiently and are used to find embeddings of the speaking video clips with a graph-based algorithm. Lipreading is then formulated as matching between the manifolds of query and reference video clips. We employ a manifold alignment technique for matching, in which an L-infinity-norm-based manifold-to-manifold distance is proposed to find the matching pairs. We apply this random forest manifold alignment technique to various video data sets captured by consumer cameras. The experiments demonstrate that lipreading can be performed effectively, outperforming state-of-the-art methods.
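
    The matching step can be sketched as follows under one plausible reading of the L-infinity-norm-based manifold-to-manifold distance, namely the worst-case nearest-neighbour distance between embedded point sets; the clip embeddings are assumed to be precomputed (and already aligned), and all names and data here are illustrative.

        import numpy as np
        from scipy.spatial.distance import cdist

        def manifold_distance(query_emb, ref_emb):
            # One reading of an L-infinity manifold-to-manifold distance:
            # match each query point to its nearest reference point and
            # score the pair of manifolds by the worst such match.
            d = cdist(query_emb, ref_emb)            # pairwise Euclidean distances
            nearest = d.min(axis=1)                  # best match per query point
            return nearest.max()                     # L-infinity norm over the matches

        def lipread(query_emb, reference_embs):
            # Classify a query clip as the reference word whose manifold is closest.
            scores = {w: manifold_distance(query_emb, e) for w, e in reference_embs.items()}
            return min(scores, key=scores.get)

        rng = np.random.default_rng(0)
        references = {w: rng.normal(size=(80, 3)) for w in ["hello", "world", "lipreading"]}
        query = references["world"] + rng.normal(scale=0.05, size=(80, 3))
        print(lipread(query, references))            # expected to report "world"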

    An Intelligent Platooning Algorithm for Sustainable Transportation Systems in Smart Cities

    No full text